Cisco Networkers 1998

home *** CD-ROM | disk | FTP | other *** search

/ Cisco Networkers 1998 / Cisco Networkers 1998.iso / pc / talks / nwk02 / nwk02.eml < prev next >

Wrap

Text File | 1998-04-16 | 105.0 KB | 1,638 lines

<HTML> <HEAD> </HEAD> <BODY> Hi, my name is Phil Harris. I work for Cisco Systems in the Consulting Engineering Department. Today I'll be discussing the features and functions and performance of Cisco's range of routing products. This is a long-standing talk at Networkers, and over the years has been developing to contain all the latest information regarding our latest routing products. The agenda for today will be to look at some perception versus reality in terms of the features and concepts we have to think about when looking at performance from an overall network perspective. Performance is measured in raw figures. However, what we'll be looking at is how the network itself impacts performance, and when we need to look at the router's performance compared to the overall network as a system. We'll then look at the various layers of switching we can accomplish through routing and what layers of performance can be affected by this. We'll then look at the actual architecture of the routing platforms, and the various switching powers within these routers that enable us to move packets between interfaces. I'll then spend some time looking at features which affect performance, how we can optimize our network design, and then spend some time looking at troubleshooting, so we can find out why the networks we're using aren't performing as we expect them to. The biggest perception is that the bigger the router, the better, the faster, the more interfaces. Whereas, in fact, what we need to look at is the application of these routers in terms of where they're positioned in the network, and how they actually perform based on the media and the types of data and protocols we're actually transporting across them. Very often it's the media itself, the Ethernet connection or the serial wide area connection, that will be the biggest point of bottleneck in my network. Therefore, understanding where I need features and functionality, as opposed to just raw bandwidth and speed, is often necessary. So let's look at what we need, which platforms and which features will be applied in these parts of the network, and the types of performance we can expect in these areas. This graph shows us the various media characteristics of the type of interfaces commonly found on routers. We look at Ethernet, Fast Ethernet, all the way through to ATM and wide area circuits. It's based on the minimum valid frame size and the bandwidth of this media that we can determine the maximum number of packets per second we can expect to be delivered by a given media. The theoretical value is based on the formula, where if I divide the bandwidth of a media by the packet size I will get the theoretical performance. And the less, or the smaller packets, the less efficient, but more packets per second. In the example of Ethernet, it can be seen with 64-byte packets I should be able to accomplish 14,880 packets per second. But as can be seen on this graph, the average size of a packet on a network in an Ethernet example is usually between 256 bytes and 1024 bytes, giving me a range of performance anywhere between 4.5 down to around 1200 packets per second. Significantly different from the maximum packets per second I may be expecting to achieve with Ethernet. So it's understanding not only what the media characteristics and limitations are, but the size of the packets that the applications I am using are likely to deliver. In the example we have here, I have an analysis of a real network. As can be seen, I've looked at a network from a perspective of two campuses linked by an FDDI ring. As can be seen by looking at the size and the distribution of various-size packets from 64-byte packets, 512-byte packets, and 1280-byte packets, we can see that the entire network will be generating on the left-hand side roughly 2850 packets per second. Now if we apply some rule of distribution of the traffic assuming that 80% of the traffic will stay local to the campus and 20% will need to traverse the FDDI ring, it can be seen that only 570 packets per second is necessary to accommodate all the traffic that needs to be forwarded across these two locations. This is a very simplistic example, but the idea of this is to show that with right net - good network design and good application of routing platforms, we can easily meet the needs and the speed requirements of both the applications and the density of traffic in a given network. So now we understand the perception versus reality in terms of what performance means. Let's look at where performance is affected and how the routers themselves provide their switching capabilities. The definition of a switch is a device that processes and transfers data from one input interface to an output interface based on some rules. Maybe this is, at the lowest level, the addressing of the end-destination network station. As it goes higher into the upper layers, we need more processing and more ability to interrogate the information and therefore make our switching decision. Routing is an overhead - the concept of taking an incoming frame, extracting the packet information or the Layer-3 information, determining from this the destination network address, then looking up some routing table or lookup table which has been maintained by a routing protocol, deciding where the packet has to be forwarded on to, making any MAC header rewrite changes appropriate for the forwarding of the packet, and then processing this out of the output interface is the overhead that routing has to accomplish. This is significantly more than, say, a simple Layer-2 bridge or switch has to look at, where they just need to see the MAC-destination address and forward the packet without any manipulation of fields such as the Time To Live or the IP Checksum field in the header.  So now we understand what the overhead of routing is. Let's look at the various architectures and the platforms, and see how they accomplish this concept of switching. The biggest dilemma for most Cisco customers is the breadth of range of the numbers of routers in the various families of routers, and the various software options in terms of process switching, optimum switching, NetFlow switching, and where I should apply these and what benefits they give me, and also what potential performance effects they have as well. We'll first look at the Cisco low-end and mid-range architecture routers. In this I'm discussing anything between the 2500 series and the 4700 - so anywhere between the 2500s, the 3600s, and the 4000 series. Each of these class of routers have some basic intrinsic characteristics in terms of the things that make up a router. We have a CPU, which is connected to both memory, for the CPU's use, and also a bus. The bus is a - required to connect the various interface processes and shared memory which is used by both the interface processes and the CPU itself. The shared memory is divided between packet buffers and system buffers, and the CPU memory is divided between routing tables and switching caches. The switching caches is the area of discussion we'll have in most depth to understand the benefit, from a speed perspective, they give us. Looking at the processes for the 2500, we see we have a 20-megahertz 68040 processor with a shared RAM up to 2 megabits - 2 megabytes for incoming and outgoing packets, and up to 16 megabytes for routing tables, switching caches, configuration information, and so on. It should be noted that with only 16 megabytes of memory, the 2500 router is ideally placed where we have a smaller number of routes in my network. For example, connecting a 2500 directly to the Internet may not be appropriate, as the size of the routing tables on the Internet are much larger than can be stored in 16 megabytes of memory alone without the configuration information and other information that needs to be stored here. The 3600 has two flavors, the 3620 and the 3640. The 3620 uses the 80-megahertz RISC-IDT 4600. The 3640 utilizes a 133-megahertz RISC-IDT 4700. These are much faster processors and the RISC-based architecture of these processors makes the process of looking up information on routing tables, or switching caches, far more optimized and therefore faster. In terms of memory, we have up to 64 megabytes of dynamic RAM on the 3620, and up to 128 megabytes of dynamic RAM in the 3640. This area of memory is used for both data and packet memory. We have separate areas for configuration information up to 128 kilobytes on the 3640. The 4000's family basically is split between the 4000 and the 4500 and the 4700. The 4000 was the original platform based on the 40-megahertz Motorola 68030 processor. Again, this is not a RISC-based processor and is just the next member up from the 2500 processor we saw earlier. This is markedly different from the 4500 and the 4700, which are now based on RISC processors - the 4500 working at 100 megahertz and the 4700 at 133 megahertz. And again the maximum memory we can have in any of the 4000 family is 32 megabytes. Again, probably insufficient for directly connecting to a very large enterprise network or the Internet itself. Now we've looked at the physical components of the low-end and mid-range routers, let's see how packets and frames are processed by these platforms so they can make the switching decision and actually accomplish the task the router's been purchased for. We'll first look at process switching. Process switching is the slowest form of switching available on any router in terms of how it works and when it should be used. Process switching is not the normal default for most Layer-3 protocols, such as IP and IPX, but we'll use this as an example to start off with, so we understand the entire process, so when we look at the protocols that do need to be process switched, we can see how this is accomplished. An incoming frame is delivered by an interface processor across the bus into the packet buffers in the shared memory area. The CPU is now interrupted to acknowledge the fact a packet is now waiting to be processed. The router looks at the destination address in the header of the Layer-3 packet information. If it cannot find an entry in one of its switching caches, it now passes that packet processing into scheduled mode. This means the router will continue doing its normal functions of updating routing tables and other functions, and when it is the turn of processing packets on the scheduler, we will now look at processing this packet. Processing the packet means looking in the routing tables to find an entry in the routing tables that matches the destination network in the IP header or IPX header at Layer-3 with a packet that's just been arrived on the interface processor. Once I find this entry, I now should have the destination network with the outgoing interface. I now have to perform some manipulation of this information to create both a new MAC header and also figure out where I need to pass this packet. Once I have this information, I will now initialize my fast cache in the low and mid-range platforms. This is a process of looking at the information I've derived and I've now transposed onto the newly formed packet, and initializing a part of memory in the CPU memory with this information. This is an example of a fast switching cache. This is an AppleTalk example. And we can see that we have the destination network number, the outgoing interface for that network, and the new MAC header which will be prepended to the frame as it leaves the interface on the way out of the router. An important point to notice here is the cache version number, in this case, 195. The switch - the cache version number tells me the last time this cache was updated or the values were changed. Now with IOS version 10.3 - or sorry - earlier than 10.3, any single entry change in the cache caused the entire cache to be invalidated. And the entire cache now needs to be built up again through process switching. Process switching as we're discussing is a fairly slow process, and also takes up much CPU time. Therefore, after 10.3 and later, only partial-cache invalidation happens. This means if a given interface goes down, or a network-topology change occurs, only those changes will be deleted or invalidated from the cache. Now if there are major numbers of changes, again, the entire cache will be invalidated. The way that the cache itself is aged out is that every minute an aging process looks at 20%, or one-twentieth rather, of the number of entries in the cache and deletes the oldest one-twentieth number of entries in the cache. If for some reason the cache becomes very large, this will be more aggressive, and up to 25% of the entries will be deleted every minute. This means that more and more process switching will be accomplished. This means that there could be a performance impact if I have a lot of entries in my cache, or my memory is limited, or if there's lots of invalidation of entries. So the two things to look for is this version number to tell me, "Are there lots of changes happening in my cache?" - and then to decide are they through a routing table change, an interface flap, or just because of the size of entries in my route cache - or my fast cache rather. Now I've initialized the fast cache with the information. If we go back to the example before, I can now pass that information, as I said, onto the newly formed packet. The frame then leaves the outbound interface processor where the source of the MAC address now coming from the interface is now added to the packet and I finally do the CRC check or the sidekick redundancy check. So those things are done on the interface processor because they don't need to be bothered with the CPU. The CPU is required to look at the routing tables and the caches, not just the prepend information, as the outbound interface can do this for us.  If we look at fast switching now, after process switching, let's assume this packet has come in. And now there is an entry in the fast cache. Well again, when the packet reaches the packet buffers we'll interrupt the CPU. But because the CPU can find an entry in the fast cache, this packet will be dealt with immediately. So we don't need to go into the scheduling mode we saw before in process switching and wait for the CPU to be available to do switching. This time we immediately will deal with looking in the fast cache, deriving the new MAC header information, prepending this to the packet in the system - in the packet buffers, and we can then pass the newly formed frame and packet out of the outbound interface, again doing the same changes on the way out. And this process has happened much quicker with less CPU steps, and therefore is a much more optimized fashion of switching that we can use. Fast switching is the default for both IP and IPX, and other protocols as well, and therefore is the usual form of operation, unless for some reason we have turned fast switching off. Some of the reasons we may not decide to use fast switching are things like turning DEBUG on. The DEBUG function requires a CPU to look at each packet in turn. This means if I turn DEBUG on, I need to turn process switching on effectively. This is done automatically by the router, but it does mean that if I leave DEBUG turned on, all those packets that are being interrogated by the router will be process switched, giving us a very big performance impact. If we look at the process and fast switching performance figures, we can see the great range in difference of performance. If we look at the 2500 series, process switching is down to about 1800 packets per second. Even this is a fairly optimistic number. Whereas fast switching is anything up to about 6000 packets per second. Again, looking at the 4700, we have 11,000 packets per second for process switching, 50,000 packets per second for fast switching. Now this is another good opportunity to look at where we would use process switching. Some protocols simply cannot be fast switched because the information in the header is not constant as it is with maybe IP or IPX. An example of this would be X.25 packets. X.25 packets are always process switched. Therefore, if we have X.25 requirements in my network, I need to pick a router, or choose a router, that can meet the performance requirements of my X.25 network. Therefore, a device that has a higher processing switching capability would be required, such as the 4500 or 4700 routers. Now we've looked at the low to mid-range, let's look at the Cisco 7000 router family. This is a different architecture and has a host of different switching mechanisms available. First we will look at the 7000 architecture itself. The 7000 architecture is based on a bus called the CX bus. The CX bus works at 533 megabits per second. And to this I connect interface processors. I have my Ethernet interface processors, ATM interface processors, and so on. The 7000 router can accommodate up to four interface processors, and has a switch processing module and a route processing module. We'll see how these two processing modules operate in a few moments. If we look at the physical characteristics of the routers, the 7000 is based around the 25-megahertz, 68040 processor. It should be obvious by now that the speed of the processor and the type of the processor affects how fast the router can do process switching. Already it should be evident that the 7000 does not have a process switching-oriented processor, and we will see why that is in a few moments. As I said, 533 megabytes per second is the speed of the bus, and I can have up to five interface processors on the 7000 and three on the 7010 version of the 7000 router. As far as memory's concerned, I can have up to 64 megabytes of dynamic RAM for holding routing tables and caching tables, and up to 512 kilobytes of memory for my packet buffers. If I choose to insert the silicon switch engine, the other variety of the switch processor, I can have up to 2 megabytes of packet buffers in the silicon switch engine. These are the physical characteristics. Now look at how - let's look - now look at how packets and frames are processed by the 7000. First, let's look at the switching paths available. We have a number of switching paths available. Process switching as we saw before, fast switching which is the default for all protocols, autonomous switching which is available on the switch processor card, silicon switch processing, which is available if I have the silicon switch processing card option inserted in the router, RSP-based optimum switching if I've inserted the RSP 7000 processing card, and RSP 7000 NetFlow, again if I've inserted the RSP 7000 route processing card. Let's look at process switching. First we will look at how memory is carved up on these routing platforms. On the switch processor, my packet memory is divided between my silicon cache, my autonomous cache, and my packet buffers. The route processor has the dynamic RAM located, which is divided between my routing tables, my system buffers, and my fast cache. We'll now look at how a packet coming in or frame coming in from the interface processor is dealt with in process switching mode. The incoming packet is placed in the packet buffers, and again we interrogate the destination network. This again is done as an interrupted mode on the CPU. Initially the autonomous - sorry - the switch processor, or the silicon switch processor, will do this function, and will first check the silicon switch cache and the autonomous cache on the switch processor for an entry. If there is no entry this means this packet has not been seen before, or this packet cannot be autonomously switched, or silicon switched. In this case, the header of the packet will be passed across the system bus into the system buffers. Now again, the CPU on the route processor will be interrupted from whatever function it's processing at the time, and will interrogate the information in the destination field, first looking at the fast cache. If the entry for this destination network does not exist in the fast cache, the process will now go into scheduled mode as before. What will now happen is we will go to process switching. The entire contents of the packet now have to be copied into the system buffers. We will now look at the routing tables for a corresponding entry for this destination network, and go through the same process of initializing the fast cache. The fast cache now has the information that can be used to do switching of this packet in fast switching mode. Once we have this information, we will also now copy this information to your autonomous cache and the silicon switch cache, if these processes have been enabled for autonomous switching, or silicon switching, because I have a silicon switch processor. The packet is now passed back into the packet buffers with the new MAC header information and passed out of the outbound interface, again right through the CRC check on the way out. As can be seen, process switching on the 7000 series is a very laborious process, takes many, many steps, and therefore is very slow and cumbersome. And as the 7000 processor based on the Motorola chips set at 25 megahertz using the 68040 processor, we can see will suffer in process switching-type environments. The 7000 was never designed to accommodate highly intensive process switching environments such as X.25 or IBM tunnel-entry or exit points.  If we look at fast switching now, the packet again is placed in the packet buffers as it comes into the switch processor, or silicon switch processor card. Assuming no entry exists in either of these caches because maybe we have turned off autonomous switching, or silicon switching, or in a more likely example, I haven't turned these on. In general, in the 7000 series, fast switching is the default switching path. So if I do not enable autonomous switching, although I normally can, or silicon switching, when I've bought the silicon switch option, I won't actually get benefit from this card. So fast switching, again - The header information is copied across the system bus into the system buffers, I find the corresponding entry in the fast cache in interrupt mode now, so that the CPU on the route processor stops what it's doing, and then copy the packet header across the system buff - bus, back into the packet buffers. And now the packet is now forwarded out of the outbound interface. This is a much faster process than process switching, but as you can see, we've still had to copy the packet header across the system bus into the route processor, and then interrupt the route processor to find the entry in the fast cache. The 7000 was really designed to take advantage of the switch processor card and the silicon switch processor card. If we look at the autonomous switch example now, an incoming packet is placed in system - in the packet buffers on the switch processor card. The autonomous switching process now will look at the destination network and interrogate the autonomous switching cache. As we saw in the previous example, process switching will initialize the autonomous cache once it's processed its own information. So now I should be able to glean this information from the autonomous cache, make my switching decision, create my new MAC header rewrite information, and process the packet locally on the switch processor card without interrupting the route processor. This also means the packet never leaves the packet buffers on the switch processor card. The packet information is now created and the outbound packet interface is chosen, and we do the CRC check as the frame leaves the outbound interface. Silicon switch processing is exactly the same process. But with the hardware assist of the silicon switch engine, we can derive much faster performance figures than we can with simply autonomous switching. But in general the process is exactly the same. A packet is passed into packet buffers. The switch processor card, in this case a silicon switch processor card, interrogates the packet information, finds the entry in its own cache, makes the rewrite information available, and processes the packet out of the outbound interface. If we look at the performance differences between process switching and silicon switching we can see why the 7000 was really designed for both autonomous and silicon switching modes of operation. Process switching of 2500 packets per second, again, is fairly optimistic. In general it would be normal to see figures of 1000 packets per second or less in process switching mode. Compared to 271,000 packets per second for silicon switch processing, we can see that the 7000 has really been designed for this type of protocol application. So if the protocols - that - cannot be fast switched, we certainly cannot autonomous or silicon-switch these packets. So for X.25 packets, again as the example, the 7000 would not be a good platform to end or terminate my X.25 packets or switch my X.25 packets, unless maybe they were being carried by some other protocol such as X.25 over TCP/IP, in which case the IP would be fast switched, or autonomously switched, or silicon switched, and then I could derive the performance I require. Okay, we've looked at the 7000 series, now let's look at the 7200 and 7500 series. First we're looking at the architecture of these routers. The 7206 is based on a network processing engine, and we'll look at the various varieties available on this. We then have three PCI interfaces, each working at 200 megabits per second. The two main PCI buses can accommodate three interface processors each. The third PCI bus is reserved for a 100-megabits-per-second Fast Ethernet interface on an I/O controller card. This allows me to have an uplink to a LAN switch, or some other Fast Ethernet infrastructure, and have my other port adapters available for serial lines and other type of port adapters. The 7200 uses exactly the same port adapters as the 7500 family we will see in a few moments. If we look at the processing engines in the 7200s, we have the NPE 100, 150, and 200. The difference between these processors are the speed and type of the processor. The NPE 100 and 150 is based on the 150-megahertz Orion R4700 chip set, whereas the NPE 200 is a 200-megahertz R5000 RISC-based processor. In terms of dynamic RAM, I have up to 128 megabytes of memory, giving me plenty of memory available for things like large routing tables and large switching caches. On the NPE 150 and 200, I have an added one megabytes of SRAM or Static RAM. Static RAM is much faster in terms of accessing the memory and therefore can deliver much faster interprocessor switching capabilities if I need these. So if I'm using very fast interfaces such as Fast Ethernet, or ATM, or maybe HSSI, I would use the NPE 150 or 200 so I can make use of the Static RAM capabilities for fast packet transfer. The 7500 starts with the 7505 router. The architecture in this respect is somewhat similar to the 7000. We have a central bus, now termed the CY bus. The CY bus differs from the CX bus in the 7000 inasmuch as it works at twice the speed. It works at one gigabit per second. The way this is accomplished is to use the same clocking bus as the 7000, but to transfer twice as many words of data, two instead of one per CPU cycle, therefore doubling the amounts of throughput I can achieve on this bus. I have a route switch processor as my combined route processor and switch processor, and up to four interface processors which can be a traditional 7000 interface processors, or the new generation of Versatile Interface Processors we will speak about later. The 7507 is based on two CY buses and now two route switch processors. In the 7507, either the RSP2s or RSP4s can be used. It should be noted that although there are two RSP cards in the 7507, only one is ever used at a time, and the other one is used as a backup in case of failure on the primary or master RSP. As can be seen, I have up to five interface processors that are divided across the two buses. It should be noted the RSP touches both buses and therefore is responsible for arbitrating packets across both buses simultaneously. Again I can use both the traditional interface processors and the versatile interface processes. The 7513 architecture is exactly the same as the 7507, except I have many more interfaces now. Up to 11 interface processor cards can be inserted into the 7513 chassis, again using either the RSP2 or the RSP4 route switch processors in master and slave mode. An additional member of this family, which I've included in this presentation, is the route switch module which is incorporated in the Catalyst 5000 and 5500 family of workgroup switches or multi-layer switches. The RSN module is an RSP2 class router blade which can be inserted into the 5000 or 5500, and give direct routing capabilities on a per-VLAN basis without the requirement for external interface processors connecting to a work group or campus switch. This is a very optimized solution when I wish to terminate and route between VLANs in a switched environment. If we look at the - how the path of the packets in the route switch module work, initially packets will come from the standard switch interface processor cards like the Fast Ethernet or the 10-megabits-per-second switch interface processors. It will be flooded across the Catalyst 5000 bus until the supervisor engine on the Catalyst 5000 or 5500 dictates that the packets should be forwarded only to the RSM. When the RSM receives these packets, it deals with them exactly as it would from an outbound interface, or external interface, in terms of routing these exactly the same way as a standard router would do. So an RSM has all these same process switching modes, fast switching modes as we'll see on the RSP processors in a few moments. If we look at the various hardware components of the 7500 family, we see that the RSP2 is based on a 100-megahertz R4600 processor. The RSP4 is based on a 200-megahertz R5000-chip set, as we saw before with the 7200. I have up to 11 processors available, sorry, interface processor cards available on the 7513, five on the 7507, and four on the 7505. If we look at the memory architecture, I can have up to 128 megabytes of DRAM on the RSP1 and 2, and up to 256 megabytes of DRAM on my RSP4. This means extremely large routing tables and caches can be accommodated, and multiple caches can be accommodated on the RSP4. It should be noted at this point that when I talk about the memory use of - for caches, I have a separate autonomous, sorry, fast cache for IPX, AppleTalk, IP. And each of these data structures needs to be stored in dynamic RAM. So obviously the larger the amount of dynamic RAM available, the more routing caches, and the larger routing caches I can store in these memory structures. I mentioned before the Versatile Interface Processor card. This is a new generation of interface processor card that allows me to put a variable number - type of interface processor into the same slot on a router. Traditionally, a router would have an Ethernet or Token Ring interface on a given slot. Now I can mix and match the various types of interfaces on a per-slot basis with up to two port adapters being accommodated by a single versatile interface processor. The versatile interface processor also has a switch processor on the VIP itself. And we have packet memory here as well. And the idea here as we'll see in a few moments is that local switching decisions can be made on the VIP card without having to cross the bus into the RSP if this is available to us. We have a variety of types of VIPs available. The two VIPs that are most commonly utilized are the VIP2-40 and the VIP2-50. The VIP2-40 has up to two megabytes of SRAM and 32 megabytes of dynamic RAM for packet buffers and also for cache entries, and the VIP-250 has a much faster processor and up to 8 megabytes of SRAM. Other service adapters can also be inserted into the VIP card such as a compression agent which allows me to compress my data in hardware without the route switch processor having to be utilized to do this very CPU-intensive function. Let's now look at the RSP-based router switching paths. We have on the RSP-based devices - so the 7500s, the 7200s, and also the route switch module, process switching, file switching, optimum switching, NetFlow switching, distributed switching, and finally Cisco Express Forwarding. The way the memory is divided on the RSP is between dynamic RAM being used for system buffers, my routing tables and my various caches and forwarding-information tables, and my packet memory in SRAM being used for packet buffers. Let's look at process switching on the 7500 series, first of all. When the packet comes in from the interface processor, we store this in the packet memory in the SRAM. The CPU is interrupted and we look in the fast cache initially to see if an entry exists for this destination network. In this case it does not, so we pass the packet into the DRAM system buffers on the RSP. The CPU now goes into scheduled mode and now after completing all other tasks it was scheduled to perform, we'll look in the routing table to find the corresponding entry as we saw before on the other platforms. We will now initialize the fast cache, or whichever cache is the default for this type of protocol, make the new right MAC header rewrite information available, pass this packet into the packet buffers, and pass it out of the outbound interface. Now this took less steps than process switching in the 75, 70 - sorry - 7000 series, but as you can still see, we need to move packet information from one type of memory to another. For fast switching, when the packet comes into the packet buffers, we will now find an entry in the fast cache as the previous process-switching example has now initialized the fast cache. The packet stays in the packet buffers in SRAM. We find the entry in the fast cache, make the MAC header rewrite information available to the packet in packet buffers, and pass the packet out of the corresponding outbound interface - a much faster process, and no moving packets from one part of memory to another part of memory. Optimum switching as an enhancement has been made in the IOS code to optimize TCP/IP or IP switching. The optimum switching cache used a different lookup mechanism than the standard fast cache does to much quicker find the entries in the cache. So when an entry is being formed in the optimum cache by the same process switching procedure we saw before, we can now guarantee to find a lookup for a given destination network in four database lookup cycles. It uses an 8-bit lookup cycle, which means with a standard TCP/IP address of 32-bits, I should be able to find a corresponding match for this particular destination at work and as a maximum four lookup cycles. In general, we do route summarization and other forms of consolidating of routing information, so usually I will find an entry quicker than this anyway. Once I've found the information in the optimum cache, which has exactly the same information as a fast cache, I will then make the same MAC header rewrites available and pass the packet out of the outbound interface. The next form of switching we spoke about was NetFlow switching. This is a new paradigm in terms of switching information between a router's ports. As you may have noticed by now, routing before or switching paths before on the other platforms, has been based on the destination network information alone as the sole piece of information we correspond to the information in the routing table. With NetFlow, we now look at much more information. We'll see how this works now. Before we do, let's define what a NetFlow really means. It's a unidirectional sequence of packets between a given source and destination. So it's not necessarily a session between two devices, but the unidirectional flow of packets between two devices in a given direction. The things I need to do to characterize NetFlows - I look at the granularity, how far and how much information do I use to differentiate different flows. What starts and stops a flow, and how quickly will I age out this information in my cache. To get some information as to why this is important, a recent study on the Internet backbone showed that the average flow length is some 21 packets long and lasts an order of ten or 20 milliseconds. Therefore, they're very short in length. Things like DNS lookups can be seen as a flow, or a Web link to a Web host where I download just a packet of information or a page of information that moves to another Web host, again will be deemed as a flow of information. If we look at the granularity from a data perspective, if we look all the way up to the application layer, this is the layer we can look up to, to determine one flow from a different flow and we'll see how that matters in a few moments. In the IP header, the information we will use will be the destination IP address, the source IP address, and the protocol field. Is it a TCP, UDP, or IGMP type of packet? The slide here shows the various protocol numbers and the corresponding types of IP protocol that could be being used. If I'm using UDP as a transport, I will then use a source UDP port number and a destination UDP port number. The source number is always a random number greater than 1024. The destination UDP port number will be determined by the type of application or the type of device I'm connecting to. Common ones would be 53 for domain-name services, 69 for TFTP, or Trivial File Transfer Protocol, and maybe 161 for SNMP or the Simple Network Management Protocol. If my application is using TCP as a reliable transport mechanism, I will use the source and destination TCP port numbers. Again, the source number will be a random number greater than 1024 and the destination number will be a well-known socket number appropriate with the application. Some common ones to look out for will be such as number 80, which is World Wide Web traffic, or 23, which is Telnet, or 21, which is FTP. Now we've decided what can actually define a flow from one flow to another on an application basis, let's look at how the router will now determine when a flow starts and when a flow stops. The start of a NetFlow is usually based on the new entry being formed in the NetFlow cache and we'll see how this happens in a few moments. Stopping or aging out or deleting an entry in the NetFlow cache can happen through two methods, either using the protocol fields, like in TCP, or the fact that we age out the cache after a relatively short period of time. If we look at the initialization of a TCP/IP flow, this is based when a packet starts a session and the SYN flag, or the synchronization flag, is set by a starting device. In this example we have Bob and Jane. Bob wishes to connect to Jane's machine. So we send a packet between Bob's PC and Jane's PC with the SYN flag sets. That initial session is enough to enter an entry in the NetFlow cache. When Jane responds or Jane's machine responds with the SYN and the ACK flag set, again, this is the opposite but corresponding flow for this session. Now we have all the entry we need to start this NetFlow. The other example in UDP, where I don't have SYN flags, is to look at the fact that we've now started a NetFlow with a unique source and destination address going to either the same or a different destination network. As long as the application source port socket numbers are different, and destination socket numbers are different, we can determine this is a new entry and start a new entry in the cache. And this is for UDP because UDP has no concept of a connection-oriented SYN and acknowledge-type scenario to start the session. When the TCP/IP session finishes, we use the FIN flag in the option field to determine that we've decided this session needs to end and the TCP/IP session needs to be broken. Again, with the same example now, if Bob's PC wishes to terminate the session with Jane's PC, we can see that we've sent a packet with a FIN flag set. Jane's PC would then respond with the FIN and the ACK flag set. Again, this is all the information we need to empty this entry from the cache because that initial random source socket number will never again occur in the near-term future, and therefore we know that this session has finally finished and we can delete that entry from the cache. In again, UDP, we don't have the luxury of a FIN flag, so what we decide to do here is to much more aggressively age out entries in the cache. And typically, entries in the NetFlow cache are aged out after ten seconds. Again, if more memory is required by the NetFlow cache and therefore less is available to the overall system, we will start aggressively or more aggressively aging out these entries. And this can be tuned to a very small granularity of age out time. It should be noted, however, if we tune the age characteristics of the cache to too small a value, this means that as a session may continue after the entry and the cache has been deleted, we will need to go through process switching to re-initialize a NetFlow cache as you will see in a few moments, and therefore take the performance hit this could involve on the router. Let's look therefore how NetFlow switching is accomplished on this 7500 series. An incoming packet is now processed and passed into the packet buffers in the SRAM area. As we can see now, we look at many more values to determine what we should do with this packet. We'll look at the destination network, the source network, the source - sorry, the port type or the protocol type, in this case TCP, the destination port number, and the source port number. So in this case the destination port is WWW or port 80, and the source is 2112. If we look at the entry in the NetFlow cache, on first glance, it may seem we have the appropriate entry in the NetFlow cache. But as you'll see, if you look carefully at the source socket number, this is a different number. Therefore, this station, this same workstation, has connected to the same destination network but with a new World Wide Web page or new application session. So this is characterized as a new flow. If the entry does not exist in the NetFlow cache, I now look at my routing table through standard process switching. I find the information in my routing table for just the destination network and a MAC header rewrite portion, and I pass this information into the NetFlow cache. Before I do this, however, I will also pass this first packet against any access control list or queuing or accounting information I need to interrogate the packet to, and as long as the packet passes the access control lists, I will now use that information to generate a complete NetFlow cache entry. This does mean if a packet does not pass an access control list in the process switching path, I will not form a NetFlow cache. This is a mechanism that can significantly improve the performance of access control lists on devices running NetFlow switching. I will now enter the values in the NetFlow cache, pass the packet in packet buffers with the new MAC header rewrite information to the outbound interface, and perform the CRC check as the packet leaves the outbound interface. I now have an entry in the NetFlow cache, so a subsequent packet from the same flow coming in will match all five values with the entry in the NetFlow cache. I can very quickly make my NetFlow MAC header rewrite information changes and pass the packet out of the outbound interface. This is a very detailed slide that looks at the types of values of information I can derive from the NetFlow switching path accounting and statistics application, which is built into NetFlow switching. A vast array of information is available, such as how many flows per second, how many given flows with the type of protocol, and between which source and destination network devices. This allows me to very clearly characterize the types of traffic flows I have on a given network running NetFlow switching. To assimilate - to take this information and allow it to be assimilated into a network management package, or billing package, Cisco has defined a set of NetFlow collectors that allow me to export the information from a router into these export devices, such as RMON probes or the Cisco Netsys products to allow us then to analyze this information and derive the accounting or statistical information we need for the traffic in my network. This also means that the memory of the router does not need to be utilized to store this information, and this can be a very large amount of information in a very large enterprise network. Okay, that was NetFlow switching. The next type of switching I'd like to discuss is Cisco Express Forwarding. Cisco Express Forwarding is a very new type of switching mechanism we have on the high-end Cisco router platforms. The drivers behind the creation of CEF are many. But mainly they're looking at how we currently do things in the cache-based systems we've seen in the presentation up to now. Caching as a mechanism for finding information, and storing information, is optimized in networks where there are not too many entries. That can be fairly large but not too many entries, and where the entries don't have to change very often, or they're not new entries all the time. Because every time we either change an entry in the cache mechanism, or there is a new entry, we have to go through that initial process switching mode, which can slow down the CPU and become an overhead in the performance of the router. There's various other types of information that we may wish to derive from the packet as well, which is not so easily done in the cache-based mechanism. And also, the inability to do various forms of load showing on a per-packet basis in caching-based mechanisms like fast and optimum caching, sometimes makes optimization from a network design perspective hard. So Cisco Express Forwarding is an attempt to meet the requirements of all of these limitations we have in cache-based systems. The basic concepts of Cisco Express Forwarding is to build adjacencies with neighbors. These adjacencies are stored in an adjacency table, also to create a Forwarding Information Base. The forwarding information base tells my router which destination networks can be found with which adjacencies. So I have these two data structures. I have the forwarding information base and the adjacency table which are the two cornerstones of the Cisco Express Forwarding mechanism. One thing about Cisco Express Forwarding, it never requires the RSP and the 7500 to ever process packets in switching mode. This is because CEF, or Cisco Express Forwarding, is topology driven. We don't have to wait until traffic generates requirements to do switching before we enter information in caches. Therefore, the moment we turn the router on and the routing protocols and the adjacent neighbors start communicating with my router, I can build my forwarding information base and my adjacency table and that stays there until a routing topology or topology change occurs. Adjacencies are nodes if they - that are adjacent to a router in one Layer-2 hop. That means they're directly connected to an interface or a media which is one hop away from the given router. The adjacency table is populated by both routing table - sorry - routing protocol information, and also the ARP cache we have, where directly connected devices on broadcast media, such as Ethernet locally, use the ARP mechanism to announce their existence. We have various types of adjacency: normal adjacency for normally connected devices; null adjacencies, where basically we have set up adjacent devices within the router where we used to just throw away packets, like a bit bucket-type application. We have glean-type adjacencies where devices are locally connected to the router. We have punt-type adjacencies. This is where I've decided that in this case, this particular type of adjacency or this particular type of connection requires me to go to RSP-based switching because of some media-characteristic change I have to take care of which can't be accomplished by Cisco Express Forwarding. We then also have the incomplete type of adjacency where I don't have enough information necessarily available to make my MAC header rewrites, and again I will go back to standard process switching. These last two are the unusual two in terms of Cisco Express Forwarding, where punt and incomplete are not normally found on networks using CEF. Because the whole concept of CEF is, I decouple the process of doing a switching path process from the RSP which is doing my routing protocol applications and functions. If we look at the adjacency table, it contains - it's indexed by Layer-3 addresses and populated by the ARP table, or as I said earlier, the protocol running RSPF or EHRP or your other IGP or BGP routing protocol. I have my new MAC header rewrite information for that given adjacent node. I have the connecting interface, the physical interface on a router that gets me to the adjacent node, the NTU size of the packets going through that interface - for example, the maximum size of an Ethernet or the maximum size on a serial link - and a number of counters which tell me how many packets have been processed for that given adjacency. The Forwarding Information Base is a table which is based, again, indexed on IP addresses, and is using the same mtrie lookup mechanism we saw in optimum switching. Which means, again, I have the ability to find entries in my forwarding information base very quickly. The contents of the forwarding information base are ordered such that we have prefixes or destination networks pointing to adjacencies which then match with the entries in my adjacency tables. If we look at how Cisco Express Forwarding works, information derived from RSPF or as I said, any other EGP or IGP, is utilized to form my forwarding information base. As long as that information stays current, the forwarding information base information stays current. What I have the ability to do is then pass the forwarding information base information to all the various line cards if they are using the versatile interface processes in my 7500. This means this information can now be pre-calculated and then distributed throughout the 7500 chassis. The adjacency table, again, is formed by either the routing table or a mechanism such as ARP to derive the local MAC header rewrite and MTU information and connected interface type. So Cisco Express Forwarding, which is released with mainstream 12.0 code and is available in various RSP-based code sets, is a mechanism that will enhance the stability of my router because my RSP is not doing so much traffic demand-based application work, and also allows me to distribute this information throughout the VIP cards on my 7500. However, I can do this distribution function with both optimum, and fast, and NetFlow switching as well. With the VIP2-40s and the VIP2-50 cards, and in fact, most of the VIP processor cards that are available today, I can actually accomplish a very similar concept. Whereby the fast, or optimum, or NetFlow cache is pushed into that area of dynamic RAM we saw before on the VIP card - which means I copy the same cache information down to the VIP processes. This means, if a switching decision has to happen locally on a VIP, whereby the interface - input interface and output interface are on the same VIP, I don't need to take my packets up across the various buses to the RSP to be passed down to the same VIP card again. This switching mechanism will happen locally. Also if I'm passing packets between two VIPs, what I can now do is just use the actual packet memory on the RSP and not interrupt the CPU of the RSP to actually forward the packets between VIP cards. This means I can get significantly faster processing of packets from a packet-per-second perspective, because I don't have to interrupt the CPU. Each VIP card has its own CPU to handle these packets in the same way as the RSP would've done in the first place. It means I can get anything up to, with packet oversight, 120,000 packets per second per VIP - which means, on an overall system wide architecture, I can get very, very high packet-per-second figures in a distributed fashion using this mechanism. If we actually look at the various switching paths available to the 7500 family, process switching occurs at about 10,000 packets per second. Plus we can see, it moves anything up to, theoretically, a million packets per second with a fully distributed system. It's highly unlikely you could actually achieve this kind of figure because the types of traffic flows would almost suddenly preclude you going from always the same input and output VIP card. But it is theoretically possible. More often it's normal that we'll get figures in excess of 300,000, 400,000, 500,000 packets per second with a carefully designed network and a card - carefully laid-out 7500. If we compare CEF to the other optimum and NetFlow and fast switching, CEF works at the same speed as optimum switching. So in this case, on the RSP, I would get 275,000 packets per second and the same figures I would get on the standard VIP card of around 90,000 packets per second, for say, packet over SONET on a VIP2-40. Okay. We've looked at how all the routers now go through the process of switching packets in the various switching paths and the various hardware applications. We will now look at the things that can affect the performance of your network. The five things I've chosen here are queuing, compression, filtering, encryption, and accounting. There are four types of basic queuing mechanisms available on the router today. The default is First In, First Out queuing. We have priority queuing, custom queuing, and Weighted Fair Queuing. It should be noted at this point though, queuing is only a mechanism that needs to be utilized if I have congestion on a network. If there is no congestion, I don't need to have to worry about going through the process of queuing. So queuing is only a mechanism that needs to be adopted if I know I have congestion in my network. The first type of queuing I will mention is priority queuing. Priority queuing is a process of defining different types of traffic into different priority queues: low, normal, medium, and high-priority queue. In this case, any traffic in the high-priority queue will get complete bandwidth. Whereas anything in the lower-priority queues will have to wait until the high-priority queue is empty before they see any available bandwidth. This is okay in some environments, but is not generally recommended as priority queuing means that one application can deprive all other applications of any network bandwidth. A more fair mechanism is custom queuing. On a protocol basis I can assign up to 16 different custom queues, and configure a percentage of interface bandwidth per queue to allow for each different application type or protocol type. So I can say that SNA may get 20% of my bandwidth, TCP/IP gets 30%, and all my remaining protocols get 50%, or some other user-definable variation and distribution of the bandwidth available. This means that no one application can ever completely starve all other applications of bandwidth on the network, and I know how much bandwidth each application will get. Weighted Fair Queuing is another mechanism which by default is turned on for all interfaces at two megabits per second or slower. This is another fair queue mechanism which looks at the relative traffic types going across a given interface. It will fairly allocate the bandwidth based on the heavy or light loads that a given session is exhibiting on the network. In general, what it means is, heavier sessions get less bandwidth, allowing smaller sessions, like interactive sessions, to get more available bandwidth. In general, an FTP-type session is more elastic than a Telnet session or maybe a voice over IP session that requires very quick and reliable amounts of bandwidth available to it, and predicts the amounts of bandwidth to it, and cannot afford to be kept behind a very large FTP transfer, for example. So Weighted Fair Queuing is a mechanism that is on by default on IOS code 11.1 and later, and that allows you to have a very fair mechanism for distributing traffic across your various interface processes. Random Early Detection is not a queue-management mechanism but a congestion-avoidance mechanism. As I said right in the beginning of this part of the discussion, queue mechanisms are about dealing with congestion on my network. Random early detection is an attempt to preclude or prevent congestion from occurring in the first place. The way random early detection works is, is that I have the ability to set precedence using the type of service field in the IP header to classify traffic into a certain classification of traffic. And what I can then do is say that, if a network interface becomes congested, I will randomly discard given frames from a low-priority or low-precedence traffic stream. TCP/IP, for example, is an elastic protocol. That means I have a sliding window mechanism that means I can send more and more packets without having to wait for an acknowledgment from the destination device. If I start throwing away packets from that session, that window size will decrease in size. And eventually I will get down to the point where only one packet will be sent until I receive an acknowledgment. This mechanism alone will slow down the flow of less-important or low-precedence traffic streams and allow high-precedence traffic streams and things like RSVP multimedia for video conferencing, for example, to get as much bandwidth as they require, alleviating congestion before it occurs and therefore not having to worry so much about congestion management mechanisms in queuing. So random early discard, or random early detection rather, is a mechanism that will stop congestion happening and allow me to classify my traffic. Compression is another type of application which can seriously affect the performance of a router. Compression is a process of taking packets and looking for repeating sequences in the information stream and then replacing that with some smaller value of information that can be then reconstituted at the remote end back into the original data. The three types of compression we use are header compression, per interface line compression, or per virtual circuit payload compression - the supported WAN encapsulations for header compression of frame relay, PPP, and X.25, for line or link compression, PPP, HDLC, or LAPB, and for payload or per virtual circuit compression, frame relay, and X.25. If we look very quickly at the link type of compression, which is the most common type of compression used, the two types of compression algorithms we have are the STAC encryption - compression mechanism, and the Predictor compression mechanism. STAC is more appropriate for lower-speed circuits such as ISDN B channels, but is very, very CPU intensive - but tends to use less memory. This means if I'm using STAC as a compression algorithm, I should choose a platform that has a very fast-working processor like a 4500 or a 4700 processor. If I'm using Predictor, it uses less memory - sorry - it uses more memory, I apologize - it uses more memory but is less CPU intensive. So a smaller processor will be more applicable if I was using Predictor as my compression algorithm. The next thing I look at in terms of things that affect performance are access control lists. Access control lists are the mechanisms where I can decide what traffic can flow between which interfaces and to which destination interfaces. I can permit and deny traffic coming into my router and going out of my router. This means that packets need to be compared to the entries in the access control lists to make sure that if a packet is not allowed to pass through that interface, it can be discarded. Depending on how big my access control lists are, configured on the IOS, the command line interface, this can seriously affect the performance in my router, because every packet on a given interface which is pointing towards an access control list, needs to be compared against an access control list. And the more extensive I make that access control list, the more information needs to be checked against a given packet. So one of the rules of thumb to give us good performance for the access control lists is to order my access control lists so the most regularly found packets in my network are found very quickly in terms of the access control lists. And that means these entries will be right at the top of my list of access controls. If I can't get away with this and I have to have very complex, long access control lists, NetFlow switching is a very good mechanism to somewhat - allow me to use complex access control lists and not solve them, but not have too much of a performance hit, because only the very first packet in NetFlow is passed against the access control lists. Or subsequent packets are then forwarded normally because the first packet made it through the access control lists. So use access control lists. They're very useful but make sure they don't become an overbearing performance impact on your router, because a CPU has to do all this comparing and then just throw away packets it finally founds - finally finds cannot be forwarded to one of its outbound interfaces. Some enhancements to access control lists are really about when we can do fast or optimum switching depending on where the access control lists are pointing at and how extensive these access control lists are. It's pretty much fair to say that all access control lists, both simple and extended, can now be fast or optimum switched, rather than process switched as they were before. So in most cases, we don't have the switching impact, we do have the CPU overhead of comparing these packets to the access control lists. Encryption is another very CPU-intensive application. Encryption is a process of taking packets and randomizing sequence of regularly occurring packets or - sorry - information in the packet stream, so that a eavesdropper or some untrusted part of the network would never be able to extract the actual data from the data stream. This again requires various algorithms to be utilized to encrypt my traffic flow. In this case, what we need to look at is making sure that only the given destinations or the given interfaces where I need encryption, actually have encryption turned on, because this will reduce the impact from the CPU having to encrypt all packets on a given router. And we use the crypto-map function to allow us to assign given interfaces to a given encryption mechanism. One word on encryption before I go on to accounting, is we should be careful also when I combine compression and encryption. If we think about it, compression is the concept of taking random packets and making - sorry - encryption is the concept of taking, repeating packets and randomizing them. Compression is a concept of looking for repeating packets of information and then compressing them. Or the way the IOS works is to encrypt before I compress so all data that would hit the compression agent would be randomized. I would find no correspondingly repeating patterns, and therefore my compression ratio will be very low. Let's look at accounting for IP and IPX. Accounting is a very simple process of looking at a given interface and counting how many packets and how many bytes flow through an interface. But this process can take anything up to 30% performance away from fast and optimum switching. So if you have the requirements to do accounting, NetFlow switching for IP certainly is an alternative but doesn't have the performance overhead of standard IOS accounting. Okay, let's look at some optimized network design ideas that can help us with some performance issues. In this example with the distributed versus centralized server form, we can see that all of these central servers are located in the FDDI ring, which is connected to the rest of the devices, the hosts connecting to those servers by a very small 56-kilobit-per-second circuit. Now in some examples, this may be the only type of circuit which is available. Or in a banking or retail application, this may be the most cost-effective way of getting my devices to my servers. If this is a limitation, there's various things that can be achieved in load balancing that can actually allow me to make better use of the available circuits. Serial-line load balancing can be achieved in various way. Process switching allows me to do packet-by-packet load sharing so that from a given source and destination, each alternate packet will be load shared across a different circuit, if the cost of the circuit between the two source and destination routers is the same, and there are no external hops on one or the other of the load-balancing serial links. So as long as I have an equal cost path in two directions between a source and destination router, I can load balance across these and with process switching, this would be done on a per-packet basis. This is a very fair method and a very even method of distributing traffic, but I'm having to use process switching to accommodate this. Fast and optimum switching will do this on a per-destination basis. In this case, a source destination network connection will be load shared on each new or alternate destination network device. If this is the case, what will happen is, depending on the distribution of size of sessions, I may or may not get a fair and equal distribution of traffic across the serial links. For example, if one of these sessions is a Telnet session, and one of these sessions is an FTP session, the Telnet session will go down one link, and the FTP session will go down the other. It may be more appropriate to have both the Telnet and the FTP and all other Telnet and FTP, vying for the same bandwidth and allowing the statistical nature of the traffic to even-out the traffic forwarding. NetFlow switching allows me, still on a per-flow basis, load balancing. This is more fair than standard fast or optimum switching because each flow is roughly the same size or statistically the same size, so much like packet-base load sharing, I can get a better distribution of traffic across two load-balancing equal-cost paths between routers. If I'm using ISDN, there's two mechanisms I can use to aggregate my B channels. Cisco's own Dialer Load Threshold mechanism, which allows me to bundle together BRI channels into looking like one 128-kilobit-per-second pipe, and that's fast switched, or using PPP multi-link RFC 1990, which again, can now be fast switched to allow me to fast switch these packets across a 128-kilobit-per-second or greater bundle of ISDN, BRI, or PRI channels. So these are some of the things where I have intrinsic media restrictions that can allow me to use the IOS features to better optimize my network design and traffic forwarding. Let's look at a few troubleshooting things now that can help us define when I think my network should be performing okay, but I seem to have performance issues. There are various statistics that can be gathered from the interface processor cards as to how the network is performing or how the devices connected to the routers are performing. First of all we'll look at the buffers and queues. The first thing we will look at is the ignore counter. The ignore counter is simply when a packet has been sent to the router for some reason to be processed, and I have no method of placing this in an initial packet buffer. If we remember from the discussion of process and fast switching, the first thing we do is to pass the packet into the packet buffers in the shared memory area. If there's no packet buffers available in the shared memory area, then I cannot process this packet and I'll have to ignore the packet. So this means that if for some reason there are no longer enough interface processes available in my packet buffers. This is the reason I have ignores. This usually means that for some reason, I either have no way of emptying out those packet buffers quick enough because my processor can't deal with these quick enough or some other memory limitation is occurring. If this is the case, we need to look at whether we are having too slow a processor in this router and maybe we need to upgrade the processor in the router, or maybe something else is utilizing this memory that we're not aware of. Input drops are the next thing we'll look at. This is where a packet has successfully made it into the packet buffers, but now for some reason, I can't transfer this packet into system buffers. Now this could be quite an important problem. System buffers, as we may remember, is what I use to deal with packets that can be process switched. Now these could be normal data packets, like X.25 packets, or if I'm running DEBUG, IP packets. But moreover, they will be routing protocol packets like RIP and SAP for IPX and EIGRP or OSPF in IP networks. If I can't have a - I can't find a system buffer to place my incoming routing protocol packets, I could, theoretically, start causing network instability because I would not be receiving hellos and acknowledgments in my routing protocol, and I could deem that my topology of my network has changed if I do not receive enough updates or acknowledgments in a given period of time. So this is quite an important issue to deal with immediately, you see this. System buffers, as I said, are used for process switched packets and they can be tuned by network operators. Whereas packet buffers are not tunable by a command-line interface. We do not normally recommend that people tune the system buffers unless they're guided by the Technical Assistance Center or some other body that can look at why they're having these issues and make the appropriate changes. Normally, if this has happened, a subsequent release of the IOS may have fixed the bug or issue that causes this buffer starvation. And if you've left your buffer tuning as it was before, you will no longer be able to take the benefit of the new IOS changes and also the maximum buffer performance. The IOS has been designed to automatically adjust the size of these buffers in case one buffer may be starved by another to make sure there's an equal, fair allotment of buffers in the system buffer area. Output drops are the opposite to input drops. A packet in system buffers has no way of being placed into the packet buffers to be placed into an outbound interface queue. This means that this packet will now have to be dropped because I have no means of passing it out of the system buffers and I don't want to use up system buffer area memory, if I can help it. Again, this is probably caused by the same problem we had in the initial discussion about ignores.  Output drops. This is now where I have the packet successfully in my packet buffers, but for some reason I have no method of getting it out to my outbound interface. This is because either the input - sorry - the output queue on the interface is full or the media connected to the full is so busy I simply cannot pass the packet out. This can be done for various reasons. Normally, it will be looked at - this could be down to the fact the outbound interface media is congested. Maybe this is a very highly utilized Ethernet segment or a congested wide area link. Maybe some form of queuing or discard mechanism for a wide area network circuit would be appropriate. So I don't fill up my output queues on my outbound interfaces and I can flow my traffic, again through these interfaces. This again, will be quite serious, if again this was a routing protocol packet or high-priority packet for an application that had to have service delivery. There are various statistics that can be looked at. If I do this show controller cbus command on the CLI, I will now look at the size and number of buffers I have in the packet buffers. But again remember, these buffers are allocated when the CPU first turns on the router, the router's first booted up, and it looks at the number of interfaces of various types, and then allocates these buffers based on the MTU size of those interfaces. This can't be tuned. In fact, the only way you can really tune this is to not have the same amount of interfaces in that router. By changing the number of interfaces I will re-allocate how many buffers available in various buffer sizes. If I do show buffers I now look at these system buffers, and these buffers are tunable. But again, I would recommend that you either contact your local SE or the Cisco tech before you start playing and tuning with these buffers as they can have other serious effects on network and router performance. The interface statistics command, with the show interface, whatever the interface is - for example, show interface zero - shows me a lot of information that can again help me to see how my buffers are being used and how my queues are being used. We can see how many output queue drops we have, how many -sorry - how many input queue drops we have, how many ignores, and how many times we just don't have any buffers to place these packets into. There's no real hard-and-fast numbers as to when one number means something more than another number. The general rule of thumb is, if I have anything up to about 5% to 6% to 7% ignores compared to my total number of packets, this is probably acceptable. Anything up to 8% or 9% I would start at least thinking about what's happening on my network. Anything above this, I would start contacting either a network operation team or the TAC or my local SE to come in and find out what's really going on in my network and why I'm dropping or ignoring so many packets. These aren't hard-and-fast numbers, they're just a rule of thumb to give you some degree of piece of mind as to when to start worrying about a problem. Another very useful command is the show interface stat command. This is a non-documented command and you will not find it by using the help facilities within the router CLI. But the show interface stat command will tell you which interfaces are doing process switching, route caching, which means fast or optimum switching, or autonomous or silicon switching. If I think that all my packets should be fast switched but my performance is very poor, by typing this command I will see whether for some reason all my packets are being processed switched. For example, somebody's turned DEBUG on and left DEBUG running on the router, again forcing all those packets into the process switching path. Another interesting command is the show ip interface command. This will show me things like, "Do I have access control lists setup on this particular protocol. Am I using facilities such as autonomous switching?" Remember, autonomous switching is always available on a 7000, but by default, fast switching is turned on for IP and IPX and so on. Am I using accounting information? And am I using compression on say, TCP header information? So this command is very, very useful so I can understand when I'm doing features or turning on features that could affect performance on my router. So the things to look for if I'm finding network performance issues are, percentage of drops and ignores and I gave you some rules of thumb you can apply there - things that hog the CPU like encryption and compression, especially, for example, if my compression is trying to compress data that can't be compressed, like a bit image, or a GIF file, or a JPEG image that really - or a file that's already been compressed, that simply cannot be compressed anymore. The CPU will still attempt to compress it but come up with a very, very high CPU utilization with poor results. Making sure my configuration is correct so that if I can use autonomous switching I am - if I need to use process switching, I have a router platform that has the appropriate amount of processing performance. Stability of my network is also very important. Routing protocol updates need to be processed by the CPU. In this case, if I have a lot of route instability, or topology instability in my network, the router itself's performance will suffer from this. For example, because my cache isn't being validated very quickly and I have to again, go through process switching, in the example of optimum, NetFlow, and fast switching. So from a summary perspective, understand the performance requirements of your network, remembering the intrinsic media restrictions you have. Use a faster switching supported - the fastest supported switching path, be it autonomous switching or optimum switching, or silicon switching if you have the silicon switch engine on the 7000. On the 7500, look for distributed switching functions like distributed NetFlow and distributed optimum switching on the VIP cards that are going to give you, again, a performance improvement. Choose the appropriate router platform, and carefully implement these features such as compression, accounting, access control lists, that can really affect the performance of your network equipment. That's the conclusion of this presentation. I'll also recommend there are various other presentations along this theme that could be very useful for an overall network - enterprise network solution perspective. The presentation by Marcus Phipps on the performance and architecture of campus LAN switches, again goes through the same basic steps of understanding how the boxes work and how performance can be affected by various characteristics. Thank you. </BODY> </HTML>